THE INTEGRATION OF SYSTEM SPECIFICATIONS
AND PROGRAM CODING 

William R. Luebke 
Computer Sciences Corp. 


This report is a description of experience in maintaining up-to-date documentation for 
one module of the large-scale Medical Literature Analysis and Retrieval System II (MEDLARS
II). Several innovative techniques have been explored in the development of this system’s 
data management environment, particularly those that use PL/I as an automatic documenter. 
The PL/I data description section can provide automatic documentation by means of a 
master description of data elements that has long and highly meaningful mnemonic names 
and a formalized technique for the production of descriptive commentary. The techniques 
to be discussed are not exotic or unusual but are, instead, practical methods that employ
the computer during system development in a manner that assists system implementation, 
provides interim documentation for customer review, and satisfies some of the deliverable 
documentation requirements. 

DOCUMENTATION THROUGH DATA DEFINITION 

MEDLARS II is a very complex system involving more than 50 PL/I programs that 
must share approximately 500 separate data variables. Most of the programmers assigned to 
the implementation phase had only limited experience in PL/I programming, and only a few 
had participated in the design of the system. The delivery schedule required that coding be- 
gin before the entire design had been completed and thoroughly reviewed. Therefore, an 
efficient mechanism for communicating the original design and any modifications of it to 
the programmers was essential. 

With the number of programmers involved, each having responsibility for and knowl- 
edge of only a segment of the system, it was necessary to have ways to guarantee consistency 
in data definition and usage. The PL/I language has features that support these objectives. 

The data declaration statement in PL/I was chosen as the basis for design documentation to 
achieve data consistency, minimize coding and keypunching, and establish a basis for 
computer-produced portions of the final system documentation. 

The DECLARE statement is a compiler instruction that defines the attributes and re- 
lationships of data variables. The data in the system are arranged in over 30 separate data 
structures or arrays of structures, each defined by a DECLARE statement. Figure 1 shows 
a portion of one of these structures, PRINT_FORMAT_TABLE. Line 1 names the structure 
and gives its attributes. The lines beginning with the number 2 name the data variables in 
the structure and establish their individual attributes. Comments in the PL/I language are
delimited by /* */. Notice that most of the DECLARE statement consists of comments. It is
through the comments that the system’s programs are related to the data.

Figure 1.— Sample data declaration.

The comments following the first line on the right side of the figure identify the pro- 
grams that acquire storage or free storage (the latter indicated by parentheses) for the data 
structure. Below this is a comment paragraph that discusses the occurrence of the table 
within the system. Each data element in the structure is named on a line beginning with the 
number 2. Notice that long descriptive names are used in the declaration; PL/I permits names
of up to 31 characters. At the right edge of a data-element line are comments that identify
which programs set a value for that data element. For example, PFTS assigns a value to
LINE_WIDTH. Below the data-element name are comment lines that identify the programs which
use the data element’s value. In the first case, the RR signifies one of the four phases of 
processing. In the RR phase, one routine (PUF) uses the element. In the C phase of process- 
ing, two routines (PUF and PCD) use the data element. The same type of information ap- 
pears for all the data elements in all data structures. 
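The comment conventions described above amount to a small cross-reference record for each structure and each of its elements. The Python sketch below models that record. It illustrates the information carried and is not the project's PL/I tooling. The class and field names are hypothetical. The RR and C usages are attached to LINE_WIDTH only for the sake of the example, because the text does not say outright which element carries them.

```python
from dataclasses import dataclass, field


@dataclass
class ElementEntry:
    """Cross-reference carried in the comments beside one data element."""
    name: str
    setters: list[str] = field(default_factory=list)           # right-edge comment
    users: dict[str, list[str]] = field(default_factory=dict)  # phase -> programs


@dataclass
class StructureEntry:
    """Cross-reference carried in the comments of one DECLARE statement."""
    name: str
    acquired_by: list[str] = field(default_factory=list)
    freed_by: list[str] = field(default_factory=list)  # shown in parentheses
    elements: list[ElementEntry] = field(default_factory=list)

    def programs_using(self, element: str) -> set[str]:
        """Every program that uses the named element, in any phase."""
        for entry in self.elements:
            if entry.name == element:
                return {p for progs in entry.users.values() for p in progs}
        raise KeyError(element)


# Illustrative record; the usage lists are assumed to belong to LINE_WIDTH.
pft = StructureEntry(
    name="PRINT_FORMAT_TABLE",
    acquired_by=["PTSD"],
    elements=[ElementEntry("LINE_WIDTH", setters=["PFTS"],
                           users={"RR": ["PUF"], "C": ["PUF", "PCD"]})],
)
print(sorted(pft.programs_using("LINE_WIDTH")))  # ['PCD', 'PUF']
```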

Each programmer has a book of all the data structures used by the system. The book 
is produced by the PL/I compiler, and, in addition to the DECLARE statements, it contains 
an alphabetical listing of all data elements in the system. This listing, a normal compiler prod- 
uct, identifies the attributes of the element and the structure in which it appears. 

The comments in the declaration are rigidly defined by position to permit simple manip- 
ulation of these data by a computer program. At present, a PL/I program processes all the 
declarations as input data and produces a listing of data involvement for each program. Fig- 
ure 2 is an example of this program’s output. The designer is now able to express any addi- 
tions or modifications to the system design through changes in the declarations. These changes 
are quickly communicated to the programmer by means of the program data involvement 
sheets. Both the DECLARE statements and the program data involvement sheets will be part 
of the final documentation of the system. 
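The program data involvement listing is essentially an inversion. The declarations are organized by data element, and the sheets are organized by program. The Python sketch below shows only that inversion step and does not reproduce the layout of Figure 2. The input dictionary shape and the "SET" / "USE <phase>" role labels are assumptions. The original processor, written in PL/I, read the declarations themselves as input.

```python
from collections import defaultdict

# Hypothetical input: for each (structure, element), the programs that set
# it and, per processing phase, the programs that use it.
XREF = {
    ("PRINT_FORMAT_TABLE", "LINE_WIDTH"): {
        "set": ["PFTS"],
        "use": {"RR": ["PUF"], "C": ["PUF", "PCD"]},
    },
}


def involvement_sheets(xref):
    """Invert element-centred cross-references into one sheet per program.

    Returns a dict, ordered by program name, mapping each program to sorted
    (structure, element, role) rows, where role is 'SET' or 'USE <phase>'.
    """
    sheets = defaultdict(set)
    for (structure, element), refs in xref.items():
        for program in refs.get("set", []):
            sheets[program].add((structure, element, "SET"))
        for phase, programs in refs.get("use", {}).items():
            for program in programs:
                sheets[program].add((structure, element, f"USE {phase}"))
    return {prog: sorted(rows) for prog, rows in sorted(sheets.items())}


for program, rows in involvement_sheets(XREF).items():
    print(program)
    for structure, element, role in rows:
        print(f"    {structure:<20} {element:<12} {role}")
```

Rerunning the inversion after each change to the declarations is what keeps the sheets current. The designer edits only the declarations.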

There is another feature of the PL/I compiler that has been a very valuable aid in the 
use of the declarations. Before program compilation, a preprocessor scan is made of the 
compiler input, the source deck. The preprocessor phase permits inclusion of data from li- 
braries in the system and certain procedural operations to take place before compilation. 
Presently, the declarations are being cataloged into the source library under a member name 
algorithmically derived from the name of the data structure. The preprocessor system is
assigned two codes (LR and LP) to identify it within the system: LR is interpreted as data;
LP, as procedure. The declaration statements are cataloged in the library with an LR prefix,
followed by the first letter of each word in the name of the table. LRPFT is the library
member name of the PRINT_FORMAT_TABLE. When programs are cataloged, their names are
preceded by LP. For example, the first line of the sample declaration (fig. 1) indicates
that the core area for the table was acquired by a routine named PTSD. In the library
and in coding, the actual name of the routine is LPPTSD.
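The naming rule for library members is simple enough to state as code. The Python sketch below is an illustration under one labeled assumption: an eight-character limit on member names, as on OS/360 partitioned data sets. The paper does not state a limit.

```python
# Assumption: eight-character member names, as on OS/360 partitioned data
# sets. The paper itself does not give a limit.
MAX_MEMBER_LEN = 8


def data_member_name(structure_name: str) -> str:
    """LR prefix followed by the first letter of each word of the name."""
    words = [w for w in structure_name.upper().split("_") if w]
    name = "LR" + "".join(w[0] for w in words)
    if len(name) > MAX_MEMBER_LEN:
        raise ValueError(f"{name} exceeds {MAX_MEMBER_LEN} characters")
    return name


def procedure_member_name(routine: str) -> str:
    """LP prefix followed by the routine's documentation name."""
    name = "LP" + routine.upper()
    if len(name) > MAX_MEMBER_LEN:
        raise ValueError(f"{name} exceeds {MAX_MEMBER_LEN} characters")
    return name


print(data_member_name("PRINT_FORMAT_TABLE"))  # LRPFT
print(procedure_member_name("PTSD"))           # LPPTSD
```

The sketch makes one consequence of the rule visible: two tables whose words share the same initials would map to the same member name. The text does not say how such clashes were resolved.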

To gain access to the material in the source library, the programmer writes a %INCLUDE
statement in his code that names the table's library member (e.g., %INCLUDE LRPFT). The
member name is derived from the name of the table in the documentation. This statement
automatically retrieves that library member, together with the declaration statement it
contains, from the source library and causes the declaration statement to be physically
inserted into the source code prior to compilation. All data declarations exist in only one
place, the source library. All programmers use the same declaration from the library, and,
therefore, changes in the data declaration will be automatically and immediately available to
all programmers because the latest version appears in their program listing at the next
compilation. In this way, data names and table identification are consistent.

Figure 2.— Program data involvement.
Figure 3.— Preprocessor statements.
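The effect of %INCLUDE can be pictured as textual splicing. The statement is replaced by the member's text before the compiler proper sees the program. The Python sketch below models only that effect and is not the PL/I preprocessor. The dictionary standing in for the source library and the placeholder member text are hypothetical.

```python
import re

# Hypothetical stand-in for the source library: member name -> member text.
LIBRARY = {
    "LRPFT": "/* declaration of PRINT_FORMAT_TABLE would appear here */",
}

# A %INCLUDE statement on a line of its own. The trailing semicolon is
# optional in this sketch only because the text writes the example without one.
INCLUDE_LINE = re.compile(r"^[ \t]*%INCLUDE[ \t]+(\w+)[ \t]*;?[ \t]*$", re.MULTILINE)


def expand_includes(source: str, library: dict[str, str]) -> str:
    """Splice each included member's text into the source, so that every
    program is compiled against the one library copy of a declaration."""
    def splice(match: re.Match) -> str:
        member = match.group(1)
        if member not in library:
            raise KeyError(f"{member} is not in the source library")
        return library[member]

    return INCLUDE_LINE.sub(splice, source)


program = "PUF: PROCEDURE;\n%INCLUDE LRPFT;\nEND PUF;"
print(expand_includes(program, LIBRARY))
```

Every program splices in the same library text. Editing that one member therefore changes the declaration seen by every program at its next compilation.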

Using the full 31 characters for naming a variable is awkward for a programmer and can
lead to keypunching problems. The PL/I compiler helps here as well. In the preprocessor 
pass, before the actual compilation, it is possible to replace items in the source code. This 
feature is used to convert the programmer’s abbreviation of a data variable name to the full 
name. A standard algorithm is used for abbreviating a name in a table: The first three char- 
acters of the first word, an underline character, and then the first character of each succeed- 
ing word. For example, the programmer writes line width as LIN_W, adjustment type as 
ADJ_T, and case as CAS. When he receives the computer listing, all the abbreviated names 
will have been replaced, and his listing will include the full descriptive name that was used 
in the declaration of the tables. The preprocessor code required to accomplish the conver- 
sion from abbreviated name to full name has been included as an addition to the member 
in the source library that contains the declaration of the table (fig. 3). Therefore, when the 
programmer includes the table declaration in his program, he is also including the preproces- 
sor code that will accomplish all the abbreviation conversions for the data elements in the 
table. As a result, the preprocessor code is prepared only once, and the conversion is
automatically performed any time the table is used. This does not mean, however, that the
programmer cannot use the full version of the name; either version can be used in the source code.
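The abbreviation rule is mechanical enough to state as code. In the system itself, the replacement is carried out by PL/I preprocessor statements stored with each declaration (fig. 3). The Python sketch below models only the effect of those statements. The sample source line is invented for illustration.

```python
import re


def abbreviate(full_name: str) -> str:
    """First three characters of the first word, then an underline and the
    first character of each succeeding word (none for one-word names)."""
    first, *rest = full_name.upper().split("_")
    initials = "".join(word[0] for word in rest if word)
    return first[:3] + ("_" + initials if initials else "")


IDENTIFIER = re.compile(r"\b[A-Z][A-Z0-9_]*\b")


def expand_abbreviations(source: str, full_names: list[str]) -> str:
    """Replace every abbreviated identifier with its full name, leaving full
    names and other identifiers untouched. Assumes abbreviations are unique."""
    table = {abbreviate(name): name for name in full_names}
    return IDENTIFIER.sub(lambda m: table.get(m.group(0), m.group(0)), source)


names = ["LINE_WIDTH", "ADJUSTMENT_TYPE", "CASE"]
print([abbreviate(n) for n in names])  # ['LIN_W', 'ADJ_T', 'CAS']
print(expand_abbreviations("IF LIN_W > 120 THEN ADJ_T = CAS;", names))
```

The lookup leaves any identifier without an entry unchanged. A programmer who writes LINE_WIDTH in full therefore gets the same result, consistent with the statement that either version can be used.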

IMPLEMENTATION 

One man-week was required to code and catalog the declarations for the system library.
This figure does not include the design of the structures or the keypunch time. Currently,
table maintenance requires about 1/2 man-week for each calendar week. When system
implementation began, 15 man-weeks were expended in coding and testing utility routines
that output the data structures during program debugging.¹ This coding exercise was useful
for training non-PL/I programmers and as an introduction to the rather complex data
hierarchy.

PROBLEMS ENCOUNTERED 

The basic concepts set forth in this paper have been validated by experience, and pro- 
grammers find the structures easy to use. Design inconsistencies and program specification 
shortcomings appear to surface early in implementation, as soon as the program data involve- 
ment is available. However, if the reader intends to use this approach he should be aware of 
the following potential problems: 

(1) The name abbreviations in the preprocessor statements must be unique. If they
are not, the program using the declaration will not pass the preprocessor phase of 
compilation. This problem has caused several days of table lockout in our 
implementation. 

(2) The printout material used by programmers must be updated with every non- 
comment change to the declarations. Failure to do this will cause the program- 
mer to think that he has an error when his program is actually correct. 

(3) A change in a data structure requires that all programs using that structure be 
recompiled. 

(4) Program testing must be suspended while the data structures and related print 
programs are being updated in the library. 
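Problem (1) could be caught before a declaration is cataloged by generating every abbreviation and looking for duplicates. The Python sketch below assumes the abbreviation rule described earlier. The paper does not say whether such a check was used, and LINE_WEIGHT is a hypothetical element name chosen to force a clash.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Three characters of the first word, then '_' and the initial of each
    succeeding word (single-word names get no underline)."""
    first, *rest = full_name.upper().split("_")
    initials = "".join(word[0] for word in rest if word)
    return first[:3] + ("_" + initials if initials else "")


def find_collisions(full_names: list[str]) -> dict[str, list[str]]:
    """Group distinct full names by abbreviation and keep only the clashes."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in dict.fromkeys(full_names):  # drop repeats, keep order
        groups[abbreviate(name)].append(name)
    return {abbr: names for abbr, names in groups.items() if len(names) > 1}


# LINE_WEIGHT is hypothetical; it abbreviates to LIN_W, like LINE_WIDTH.
print(find_collisions(["LINE_WIDTH", "LINE_WEIGHT", "CASE"]))
```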

We found it best to update the tables no more frequently than once a month, to use ex- 
treme care, and to anticipate the sacrifice of at least one computer run by every programmer. 

DATA FILE DOCUMENTATION 

The remainder of this report briefly describes the manner in which user-oriented data 
files are documented. Approximately 30 individual files make up the MEDLARS II data 
base. A consistent method was needed for the definition of information carried in these files,
a method that could be understood by non-data-processing personnel in the user’s facility 
who are interested in the content, but not the structure, of the file and need a great deal of 
information about the files. 

Nine descriptors were devised for every data element in a file, and these files were
documented in machine-readable form so that they would be easy to update while the client
thought about problems and requested changes. The files have been evolving, and the docu- 
mentation has been able to keep abreast. Generally, it takes only a few days to document 
very sweeping changes in the design of the National Library of Medicine’s bibliographic 
files. 


¹ The structures are BASED and therefore cannot be output by the PUT DATA instruction, nor can
the variables be traced with the CHECK function.
Figure 4.-Overview of indexed citation file.


Figure 4 shows the overview of the indexed citation file. The overview describes the 
general content of the file, identifies other files in the system that are referenced by this 
file in some way, and then generally describes the record content. The record content is 
just an overview that is used for a quick introduction to the file. It is followed by a descrip- 
tion of the data within the file. Not shown is a pictorial representation of the file structure 
identifying all the data elements in the file. Each data element in a record is documented by 
punched cards that are numbered to identify the type of information carried. Figure 5 shows
the numbered data-element listing. There is also a type of card for comments. Several
small PL/I programs for formatting and editing the data have been written. A great deal of
supporting keypunch work is required for implementation of the file, but once the file has 
been established, the data are easy to update. 
Figure 5.— Data-element listing.


DISCUSSION 

MEMBER OF THE AUDIENCE: Do you feel that it is a worthwhile effort to attempt 
to document large programs that are continually changing? 

LUEBKE: I think that it is a good idea and should be helpful, as part of the source 
documentation, when the system is finally operational. Having the data within the system 
catalog will permit programmers who did not participate in the development of the system 
to identify the relationships between the programs and the data so that they can perform 
maintenance. 

MEMBER OF THE AUDIENCE: How many programmers are involved in your project? 

LUEBKE: At the present time, we have about 27 programmers working in this area. 

MEMBER OF THE AUDIENCE: For how many years has your project been in 
existence? 

LUEBKE: We began programming in August 1967 and should be finished with that 
phase in the spring of 1971.



PANEL DISCUSSION 


MEMBER OF THE AUDIENCE: I feel one of the most pertinent things that Dr. Swift 
said was that using automatic techniques to describe data is one of the most feasible and 
valuable things that can be done. In this last paper, there seems to be a suggestion of how to 
do that. Does the panel agree or disagree, and what are the relative merits of this approach? 

PANEL MEMBER: I agree fully that the independent description of data is something 
that can now be done. 

PANEL MEMBER: Not only description of data but also tying the description of the
data to the system itself so that the data descriptions entering into the system are basically
the same ones the programmer works from.

PANEL MEMBER: I would like to add that besides describing the data, there is a prob- 
lem of analyzing it. It becomes important to know where the data are throughout the pro- 
grams and subprograms and how the data interact with each other after being changed. 

PANEL MEMBER: My feeling is that as serious a problem as the traditional after-the- 
fact documentation is, the most serious part of the problem comes at the beginning of a proj- 
ect, what I call the “upstream documentation.” This is the attempt to record information that 
everybody can understand, work with, and write code from. A considerable amount of docu- 
mentation is brought into existence by going through the various stages of system design, 
from the requirements at the start of the project to the various elements and levels of the 
design approach to, finally, the stage that can be solved by writing code rather than by de- 
veloping a further level of design. 

I think the data processing business has perhaps been somewhat deficient in not putting 
enough emphasis on the total process of supporting and facilitating development of this up- 
stream documentation. If properly done, it should form a body of material that can be coupled 
to the kinds of tools that are capable of being developed now for automatic documentation. 
This upstream documentation, rather than the problem of building from something that is 
already computer processable, is the basic problem. 

PANEL MEMBER: I agree. Until program documentation flows naturally out of the de- 
signing process, it is always going to be a problem. You have to start at the beginning and 
generate most of the documentation in the process of designing a system. Until proper pro- 
cedures are followed, good documentation is never going to happen. 

PANEL MEMBER: Let me add that managers must also face the fact that certain costs 
are going to be incurred and that their project reports are going to reflect cost increases be- 
fore any code will be written. 

PANEL MEMBER: That comment compares the cost of getting the deck as it comes 
out of the computer without any comments to the cost of getting it with BELLFLOW. I think 
the manager should compare what it costs to put BELLFLOW comments in against what it costs 
to put in the comments that should be put in the deck anyway. Often if you insist on doing 
the job correctly in the first place, some of these other additions do not add much of a
burden. 

MEMBER OF THE AUDIENCE: The comment that programmers are undisciplined 
has been made several times. I wonder what the panel thinks about this. 

PANEL MEMBER: I have a few random thoughts on the subject. One of the roots of 
the problem lies in programmer education. When programmers are trained, documentation 
very often is not stressed. No organized way to write a program is taught. Programs conse- 
quently reflect the idiosyncrasies of the programmer. But the organization of a program, 
given the same kind of problem, should be standard. Unfortunately, it is not. The solution 
to this part of the problem lies in better training. 

Second, I agree that the only way to get reasonably good documentation is to have the 
documentation developed with the programming. For one thing, it is the only way it will get 
done because programmers are generally working on another project soon after the comple- 
tion of one and have neither the time nor the inclination to document once the program is 
finished. In addition, they may forget certain aspects of the program. So, the only way to 
document is to integrate documentation with normal programming activity. 

MEMBER OF THE AUDIENCE: I think you have to recognize that a computer program is a
very difficult thing to describe in the first place. No system tells how to read documentation;
there is nothing like a flowchart of how to proceed through a particular piece of
documentation, which may be in narrative form.

The programmer works directly with a program and has no way of viewing it as a reader. 
I think one of the basic problems in program documentation is that programmers are not 
trained to think of how to make their programs readable to others. Most scientists, let alone 
most programmers, are not trained to write, and this fact must be recognized as we attempt 
to develop tools for automatic documentation. 

Finally, the symbols that we use in flowcharts do not fit together with meaningful ways 
of expressing this information. I think that something ought to be done if we are going to
have data and procedure names that are 31 characters long. I wonder what the
panel’s comments on this are.

PANEL MEMBER: The situation in data processing is that the tools of documentation 
and description for computer programs that have been used are reasonably appropriate for 
basic computer program and data processing situations. But they are not really suitable for 
establishing and describing things like multiple-application, multiple-user, on-line, and real- 
time systems of various kinds. It is in many ways rather surprising that we have been able 
to do as well as we have in describing some of these newer situations with tools that were 
developed for an earlier kind of problem. Herein lies one of the main challenges for
documentation. 

PANEL MEMBER: I think one important question would be determining the amount of 
documentation that has to be done by people and the amount that can be done by computer. 

PANEL MEMBER: When a building, the hardware part of a computer system, or a 
communication switching network is documented, something that can be seen and either 
agreed or disagreed with is being documented. Documenting a program is documenting an 
idea rather than something physical. It is more difficult to accept the fact that we have to 
spend money to figure out how to document and convey an idea to somebody than it would
be for something physical. 

You could show an executive director of a company, who has not been involved in a 
program, documentation of that program, and he would find it very difficult, even if it were 
the best documentation possible, to decide whether it is worth doing or not. 

PANEL MEMBER: I think documentation should be divided into three stages. The 
first stage would be documentation of the planning stage. Then there would be documenta- 
tion of the implementation effort. Once the program is implemented, there would be ter- 
minal documentation. Possibly, this might be the way to approach documentation. I would 
say that the motivation to document is certainly very strong at the beginning of a project. 

By looking at documentation in terms of a kind of life cycle, we certainly would get a great 
deal of this documentation completed at the very beginning. It may be that a lot of our think- 
ing is simply not recorded at the time when it would be easiest. 

PANEL MEMBER: I would like to comment on documentation of the implementation 
design effort. We are trying to record the development of a program so that if modifications 
are to be made to a program, there will be some idea of how complex the procedure of chang- 
ing the program will be. 

MEMBER OF THE AUDIENCE: One of the problems in documentation is that poor 
programs are often written. Very little is known about how to write programs. If you keep 
a program long enough, debug it long enough, and patch it long enough, it finally does every- 
thing it is supposed to do. Then, we often try to document these programs, which should 
not have been written in the first place. 

PANEL MEMBER: I agree with you in one sense, and in another sense I do not. Very 
often what happens when programs are developed is that they are developed for certain 
specifications. The program, however, may be used for many years. If a little more effort 
were put into the program and a certain amount of flexibility were built into the program, 
then as the program changed, there would be less of a problem in adding to the program. 

I know it is difficult to try to foresee what changes may be needed, but I think if, for in- 
stance, the initial investment were increased by 25 percent, more than 100 percent may be 
saved during the life of that system. 

PANEL MEMBER: Another problem is that finiteness is gradually disappearing, especially
from the larger data-driven systems. There are now so many alternatives that it is almost
impossible to decide what the future alternatives will be.

PANEL MEMBER: Documenting programs that should not have been written is a prob- 
lem. The programmer is not always at fault, however. Often he receives a specification that 
does not reflect the final product. 

One company has eliminated this particular hazard by using intermediate personnel 
between the engineers and the programmers. They are familiar with the engineer and his 
field and also understand something about programming. These intermediaries read the 
specifications and translate them for the programmer. 

As far as specifications are concerned, the person who wants the program written should 
know what he wants it to do. It is not enough to know what the input is and what the output 
should be. Good preliminary documentation is needed to write up a correct specification 
for a programmer. You have to have good preliminary documentation before you can give a
specification to the programmer because he does what the specification tells him to and that 
is all he can do. 

MEMBER OF THE AUDIENCE: After a program is debugged, it is assumed to operate 
perfectly. That is not true about hardware. In hardware, fault indicators and other safeguards 
are built in so that if a system does not operate properly it stops. The equipment is often 
capable of indicating why the machine is failing. Documentation certainly is often used to 
try to track down problems, but not to the extent it could be. The typical commercial appli- 
cation is developed so that you can check for data errors, but the program will not check to 
see that it itself is performing properly. 

PANEL MEMBER: Many of the things we do, of course, have been done by the systems 
approach but because of various factors including the undependability of Government fund- 
ing, we often start and then stop programming efforts. Another problem is that badly docu- 
mented programs are often inherited. 

MEMBER OF THE AUDIENCE: This brings up an interesting point. I think all the 
panel members have operated under Government contract. What do you think of the specifi- 
cations laid down for programming documentation and the followup action on the part of 
the Government from the start of a program to the delivery of the final product? 

PANEL MEMBER: I believe that all contracts should stipulate that the contractor 
maintain and document his program for a specified length of time after the completion of 
the contract. 

MEMBER OF THE AUDIENCE: How do you handle the problems of having to take 
over a system that is already set up and documented by others?

PANEL MEMBER: One case that I happen to know about concerned a defense-oriented 
system of considerable size. It became necessary for another group to take over its mainte- 
nance. In that particular instance, the company worked backwards. They began by specifying 
what the requirements were when the requirements specification was developed. They then 
proceeded to go through the first level of developing the end-item specification, called the 
general design specification. These appeared to be in agreement with what was going on. 

Then the next levels of specifications were produced and checked until, finally, all the speci- 
fications that ought to have been produced in the process of developing the program in the 
first place were written. Some “fudging” brought the actual code into agreement with the 
specifications thus created. Only at that point did it really become possible for the company 
to relax and begin actual maintenance. 

MEMBER OF THE AUDIENCE: One of our principal customers has a requirement 
that many of our programs be sent to COSMIC for further distribution. In the past, we had 
a great deal of difficulty in getting the programs accepted. We solved that problem by setting 
up a review team independent of the original programming group. The review group is com- 
posed of operations, engineering, and programming people that take a program and work 
through the program library, program by program, to see whether it is completely under- 
standable and can be shipped out to someone else with the assurance that it works and can 
be run elsewhere on the same machine. 

The original programming group has a fixed budget for each program sent through the 
review group. If it costs more than the allotted amount to review, the originating cost center 
gets tabbed for the excess cost. We have not had enough experience to see how this is going
to work out, but this factor of economic accountability would seem to guarantee its success. 

MEMBER OF THE AUDIENCE: The talk about developing the documentation for the 
life cycle of a program sounds very nice; however, for some programs that is not necessary. 

I think automation can really be useful by keying information on data in different ways. 

I think there should be a file of information about a program that can be called upon
when necessary, but there is no need for reams of information that you cannot find your way 
through. The trouble is not having enough documentation but having so much that you can- 
not begin to understand how to use it or how to be able to get into it. There are times when 
you have to change a program in a short period of time. Then you need a certain amount of 
assurance that you have documentation for and know the location of all the data of a cer- 
tain kind for each program. You do not need to have something printed out every time one 
factor in a program is changed. Maybe we should look upon the computer as being an infor- 
mation retrieval system of documents and documentary information about various elements 
of the program and its logic and not let it become a producer of printed documents. 

PANEL MEMBER: I agree that we do not need to print all the necessary documentation. 
We find a tape recorder very useful in documentation in two ways. General descriptions and 
information about the program are recorded but not printed. We keep the tape for reference 
only. Generally, the original designer talks about where you might change the program and 
certain idiosyncrasies of the program. This tape gives a very good picture of how the pro- 
gram was developed and why and where you might have to be very careful if changes have 
to be made. 

We also use the cassette to record basic information that a programmer should write
but never gets around to doing. 

MEMBER OF THE AUDIENCE: If you had clear-cut specifications in advance, then 
clearly all you need is somebody to code it. That is one thing. But in advanced theoretical 
research, you probably cannot get clear-cut specifications in advance that will lead to effi- 
cient ways of eventually writing the program. I think that we may lose some creativity in 
programming if we force too many specifications on programmers. 

PANEL MEMBER: I would say that 90 percent of all the programming done in the
United States is not creative. I would also say that there is certainly no reason why one 
should not have complete flexibility in the other 10 percent of the cases. 

PANEL MEMBER: If you are generating a system to be used only once, you do not 
care much about how it is evolved. If you are generating something that is going to be used 
many times, then you should not be so creative that you ignore efficiency and good design. 

MEMBER OF THE AUDIENCE: You mentioned a 25-percent increase in cost to make 
programs easier to handle. How do you decide which programs will persist and thus need 
additional effort to make them meet future requirements? 

PANEL MEMBER: In some situations, for example, a payroll system, it is relatively 
easy to see how the system might have to change in the future as the company changes. In 
other cases, for example, the space program, how the system will change is not so obvious 
but the fact that it will is. So some allowance for change has to be made and is worth the 
extra 25 percent. But even in these obvious cases, little is being done. 
PANEL MEMBER: It seems to me that almost any program of reasonable size that is
being developed is worth spending that extra 25 percent on. You may be wrong, and it may 
have only limited life. But if you think that it is going to have long life, you ought to put in 
extra effort. 